Smart Paper Notes

Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Lei Yang, Xinyu Zhang, Li Wang, Minghan Zhu, Chuang Zhang, Jun Li

Category: Computer Vision

2022-07-10

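Monocular 3D object detection is an essential perception task for autonomous driving. However, its heavy reliance on large-scale labeled data makes model optimization costly and time-consuming. To reduce this over-reliance on human annotation, we propose Mix-Teaching, an effective semi-supervised learning framework that uses both labeled and unlabeled images during training. Mix-Teaching first generates pseudo-labels for unlabeled images by self-training. The student model is then trained on mixed images with much denser and more precise labels, created by merging instance-level image patches into empty backgrounds or labeled images. This is the first approach to break the image-level limitation and place high-quality pseudo-labels from multiple frames into one image for semi-supervised training. In addition, because confidence scores and localization quality are misaligned, it is hard to separate high-quality pseudo-labels from noisy predictions using a confidence-based criterion alone. We therefore introduce an uncertainty-based filter to help select reliable pseudo-boxes for the mixing operation. To the best of our knowledge, this is the first unified SSL framework for monocular 3D object detection. Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under various labeling ratios on the KITTI dataset. For example, our method achieves an improvement of around +6.34% AP@0.7 over the GUPNet baseline on the validation set when using only 10% of the labeled data. Furthermore, by leveraging the full training set and the additional 48K raw images of KITTI, it further improves MonoFlex by +4.65% AP@0.7 for car detection, reaching 18.54% AP@0.7 among monocular-based methods on the KITTI test leaderboard. The code and pretrained models will be released at https://github.com/yanglei18/mix-teaching.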
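
The filtering and mixing steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `PseudoBox` fields, both thresholds, and the "lowest uncertainty first" ranking are assumptions made for illustration, and real image patches are replaced by plain records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PseudoBox:
    frame_id: int       # which unlabeled frame the teacher prediction came from
    confidence: float   # classification score of the teacher model
    uncertainty: float  # hypothetical localization-uncertainty estimate (lower is better)


def select_reliable(boxes: List[PseudoBox],
                    min_confidence: float = 0.7,
                    max_uncertainty: float = 0.2) -> List[PseudoBox]:
    """Keep boxes that pass both the confidence and the uncertainty test."""
    return [b for b in boxes
            if b.confidence >= min_confidence and b.uncertainty <= max_uncertainty]


def mix_into_one_image(boxes: List[PseudoBox], max_instances: int = 3) -> List[PseudoBox]:
    """Gather reliable boxes from several frames as labels of a single mixed image."""
    ranked = sorted(boxes, key=lambda b: b.uncertainty)
    return ranked[:max_instances]


predictions = [
    PseudoBox(frame_id=0, confidence=0.95, uncertainty=0.05),
    PseudoBox(frame_id=0, confidence=0.90, uncertainty=0.60),  # confident but poorly localized
    PseudoBox(frame_id=1, confidence=0.40, uncertainty=0.10),  # well localized but low score
    PseudoBox(frame_id=2, confidence=0.80, uncertainty=0.15),
]
reliable = select_reliable(predictions)
mixed_labels = mix_into_one_image(reliable)
```

The second prediction shows why a confidence-only rule is not enough: it would be accepted on score alone even though its localization estimate is poor.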

On-Device Training Under 256KB Memory

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

Category: Computer Vision

2022-06-30

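On-device training enables a model to adapt to new data collected from sensors by fine-tuning a pre-trained model. However, the training memory consumption is prohibitive for IoT devices with tiny memory resources. We propose an algorithm-system co-design framework that makes on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) the quantized graphs of neural networks are hard to optimize because of mixed bit-precision and the lack of normalization; (2) the limited hardware resources (memory and computation) do not allow full backward computation. To cope with the optimization difficulty, we propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize quantized training. To reduce the memory footprint, we propose Sparse Update, which skips the gradient computation of less important layers and sub-tensors. The algorithmic innovation is implemented by a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offloads runtime auto-differentiation to compile time. Our framework is the first practical solution for on-device transfer learning of visual recognition on tiny IoT devices (e.g., a microcontroller with only 256KB of SRAM), using less than 1/100 of the memory of existing frameworks while matching the accuracy of cloud training plus edge deployment for the tinyML application VWW. Our study enables IoT devices not only to perform inference but also to continuously adapt to new data for lifelong learning.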
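
The sparse-update idea (compute gradients only for a subset of layers so that training fits a memory budget) can be sketched as follows. The greedy "importance per KB" rule, the layer names, and all numbers are hypothetical; the abstract does not say how the layers are actually chosen.

```python
from typing import Dict, List, Tuple


def choose_layers_to_update(layers: List[Dict], memory_budget_kb: float) -> Tuple[List[str], float]:
    """Greedily pick layers by importance per KB until the budget is used up.

    Layers that are not picked would be frozen, so no gradient is computed
    or stored for them during the backward pass.
    """
    ranked = sorted(layers, key=lambda l: l["importance"] / l["memory_kb"], reverse=True)
    chosen: List[str] = []
    used = 0.0
    for layer in ranked:
        if used + layer["memory_kb"] <= memory_budget_kb:
            chosen.append(layer["name"])
            used += layer["memory_kb"]
    return chosen, used


# Hypothetical per-layer importance scores and training-memory costs.
layers = [
    {"name": "conv1", "importance": 0.1, "memory_kb": 40},
    {"name": "conv2", "importance": 0.3, "memory_kb": 60},
    {"name": "conv3", "importance": 0.5, "memory_kb": 50},
    {"name": "fc", "importance": 0.4, "memory_kb": 10},
]
chosen, used_kb = choose_layers_to_update(layers, memory_budget_kb=100)
```

With a 100 KB budget, `conv2` is skipped because adding it would exceed the budget, while the cheaper `conv1` still fits.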

Hard Sample Aware Noise Robust Learning for Histopathology Image Classification

Chuang Zhu, Wenkai Chen, Ting Peng, Ying Wang, Mulan Jin

Category: Artificial Intelligence | Computer Vision | Machine Learning

2021-12-05

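Deep learning-based histopathology image classification is a key technique for helping physicians improve the accuracy and promptness of cancer diagnosis. However, noisy labels are often inevitable in the complex manual annotation process, and they mislead the training of the classification model. In this work, we introduce a novel hard-sample-aware, noise-robust learning method for histopathology image classification. To distinguish informative hard samples from harmful noisy ones, we build an easy/hard/noisy (EHN) detection model using the sample training history. We then integrate the EHN model into a self-training architecture to lower the noise rate through gradual label correction. With the resulting almost-clean dataset, we further propose a noise suppressing and hard enhancing (NSHE) scheme to train the noise-robust model. Compared with previous work, our method retains more clean samples and can be applied directly to real-world noisy datasets without using a clean subset. Experimental results show that the proposed scheme outperforms current state-of-the-art methods on both synthetic and real-world noisy datasets. The source code and data are available at https://github.com/bupt-ai-cz/hsa-nrl/.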

Construct Informative Triplet with Two-stage Hard-sample Generation

Chuang Zhu, Zheng Hu, Huihui Dong, Gang He, Zekuan Yu, Shangshang Zhang

Category: Computer Vision

2021-12-04

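In this paper, we propose a robust sample generation scheme for constructing informative triplets. The proposed hard-sample generation is a two-stage synthesis framework that produces hard samples through effective positive and negative sample generators in the two stages, respectively. The first stage stretches the anchor-positive pairs with a piecewise linear manipulation and improves the quality of the generated samples with a carefully designed conditional generative adversarial network that lowers the risk of mode collapse. The second stage uses an adaptive reverse metric constraint to generate the final hard samples. Extensive experiments on several benchmark datasets verify that our method outperforms existing hard-sample generation algorithms. We also find that combining the proposed hard-sample generation method with existing triplet mining strategies further improves deep metric learning performance.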

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Feng Xu, Chuang Zhu, Wenqi Tang, Ying Wang, Yu Zhang, Jie Li, Hongchuan Jiang, Zhongyue Shi, Jun Liu, Mulan Jin

Category: Computer Vision

2021-12-04

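Objectives: To develop and validate a deep learning (DL)-based primary tumor biopsy signature for preoperatively predicting axillary lymph node (ALN) metastasis in early breast cancer (EBC) patients with clinically negative ALN. Methods: A total of 1,058 EBC patients with pathologically confirmed ALN status were enrolled from May 2010 to May 2020. A DL core-needle biopsy (DL-CNB) model was built on the attention-based multiple instance learning (AMIL) framework to predict ALN status using DL features extracted from the cancer areas of digitized whole-slide images (WSIs) of breast CNB specimens annotated by two pathologists. Accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and areas under the ROC curve (AUCs) were analyzed to evaluate the model. Results: The best-performing DL-CNB model, with VGG16_BN as the feature extractor, achieved an AUC of 0.816 (95% confidence interval (CI): 0.758, 0.865) in predicting positive ALN metastasis in the independent test cohort. Furthermore, the model incorporating clinical data, called DL-CNB+C, yielded the best accuracy of 0.831 (95% CI: 0.775, 0.878), especially for patients younger than 50 years (AUC: 0.918, 95% CI: 0.825, 0.971). Interpretation of the DL-CNB model showed that the top signatures most predictive of ALN metastasis were characterized by features including density ($p$ = 0.015), circumference ($p$ = 0.009), circularity ($p$ = 0.010), and orientation ($p$ = 0.012). Conclusion: Our study provides a novel DL-based biomarker on primary tumor CNB slides for preoperatively predicting the metastatic status of ALN in patients with EBC.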
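
To make the attention-based multiple instance learning step more concrete, here is a minimal sketch of attention pooling over the patch features of one slide. The scoring function (a scaled tanh of a dot product), the parameter names `v` and `w`, and the toy feature vectors are assumptions for illustration; the abstract does not give the model's exact form.

```python
import math
from typing import List, Tuple


def attention_pool(instances: List[List[float]],
                   v: List[float],
                   w: float) -> Tuple[List[float], List[float]]:
    """Pool instance features into one bag feature with softmax attention.

    score_k = w * tanh(v . h_k), weights = softmax(scores),
    bag = sum_k weights_k * h_k
    """
    scores = [w * math.tanh(sum(vi * hi for vi, hi in zip(v, h))) for h in instances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return weights, bag


# Three toy patch features from one slide (the "bag").
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, bag_feature = attention_pool(patches, v=[1.0, -1.0], w=2.0)
```

The attention weights sum to one, so the bag feature is a weighted average of the patch features, and the weights themselves indicate which patches drive the slide-level prediction.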

Sample Prior Guided Robust Model Learning to Suppress Noisy Labels

Wenkai Chen, Chuang Zhu, Yi Chen

Category: Computer Vision | Machine Learning

2021-12-02

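Imperfect labels are ubiquitous in real-world datasets and seriously harm model performance. Several recent effective methods for handling noisy labels share two key steps: 1) dividing samples into cleanly labeled and wrongly labeled sets by training loss, and 2) using semi-supervised methods to generate pseudo-labels for samples in the wrongly labeled set. However, current methods always hurt informative hard samples, because hard samples and noisy samples have similar loss distributions. In this paper, we propose PGDF (Prior-Guided Denoising Framework), a novel framework that learns a deep model to suppress noise by generating prior knowledge about the samples, which is integrated into both the sample-dividing step and the semi-supervised step. Our framework keeps more informative hard clean samples in the cleanly labeled set. It also improves the quality of pseudo-labels during the semi-supervised step by suppressing the noise in the current pseudo-label generation scheme. To further enhance the hard samples, we reweight the samples in the cleanly labeled set during training. We evaluated our method on synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M. The results show substantial improvements over state-of-the-art methods.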
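
The two generic steps that this abstract attributes to recent noisy-label methods can be sketched in a few lines. This illustrates the baseline pipeline, not PGDF itself; the fixed loss threshold, the argmax pseudo-label rule, and all numbers are assumptions.

```python
from typing import Dict, List, Tuple


def split_by_loss(losses: List[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Step 1: treat low-loss samples as cleanly labeled, the rest as wrongly labeled."""
    clean = [i for i, loss in enumerate(losses) if loss < threshold]
    noisy = [i for i, loss in enumerate(losses) if loss >= threshold]
    return clean, noisy


def pseudo_label(probabilities: List[float]) -> int:
    """Step 2: replace a suspect label with the model's most likely class."""
    return max(range(len(probabilities)), key=lambda c: probabilities[c])


# Hypothetical per-sample training losses and model predictions.
losses = [0.1, 2.3, 0.4, 1.9]
predictions: Dict[int, List[float]] = {
    1: [0.2, 0.7, 0.1],
    3: [0.6, 0.3, 0.1],
}
clean_ids, noisy_ids = split_by_loss(losses, threshold=1.0)
new_labels = {i: pseudo_label(predictions[i]) for i in noisy_ids}
```

A hard but correctly labeled sample can also have a high loss (sample 3 here, for instance), so a loss-only split would discard its label; that is the weakness the abstract points out.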

Unsupervised Domain Adaptation Through Transferring both the Source-Knowledge and Target-Relatedness Simultaneously

Qing Tian, Yanan Zhu, Chuang Ma, Meng Cao

Category: Machine Learning | Machine Learning (Statistics)

2020-03-18

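Unsupervised domain adaptation (UDA) is an emerging research topic in machine learning and pattern recognition that aims to help learning in an unlabeled target domain by transferring knowledge from a source domain.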

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-15

Optimization in multi-task learning (MTL) is more challenging than single-task learning (STL), as the gradient from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts are activated for each task. This prevents the sharing of the entire backbone model between all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
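
The statement that only a small set of experts is activated for each task can be illustrated with a generic top-k mixture-of-experts layer. This is a sketch of standard sparse gating, not Mod-Squad's model or its new loss; the scalar toy experts, the gate logits, and `k=2` are assumptions.

```python
import math
from typing import Callable, Dict, List


def top_k_gate(logits: List[float], k: int) -> Dict[int, float]:
    """Keep the k largest gate logits and renormalize them with a softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the maximum for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float,
                experts: List[Callable[[float], float]],
                logits: List[float],
                k: int = 2) -> float:
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    gates = top_k_gate(logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Four toy scalar experts and hypothetical gate logits for one task.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
task_logits = [2.0, 2.0, -1.0, 0.5]
gates = top_k_gate(task_logits, k=2)
output = moe_forward(3.0, experts, task_logits, k=2)
```

Because the other experts receive zero weight, the selected subset can in principle be run on its own, which is the property the abstract relies on when it extracts a standalone model per task.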

Physically Plausible Animation of Human Upper Body from a Single Image

Ziyuan Huang, Zhengping Zhou, Yung-Yu Chuang, Jiajun Wu, C. Karen Liu

Category: Computer Vision | Artificial Intelligence | Robotics

2022-12-09

We present a new method for generating controllable, dynamically responsive, and photorealistic human animations. Given an image of a person, our system allows the user to generate Physically plausible Upper Body Animation (PUBA) using interaction in the image space, such as dragging their hand to various locations. We formulate a reinforcement learning problem to train a dynamic model that predicts the person's next 2D state (i.e., keypoints on the image) conditioned on a 3D action (i.e., joint torque), and a policy that outputs optimal actions to control the person to achieve desired goals. The dynamic model leverages the expressiveness of 3D simulation and the visual realism of 2D videos. PUBA generates 2D keypoint sequences that achieve task goals while being responsive to forceful perturbation. The sequences of keypoints are then translated by a pose-to-image generator to produce the final photorealistic video.

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

Category: Natural Language Processing

2022-11-30

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
